Fault detection in instruction translations

ABSTRACT

In one embodiment, a method for identifying and replacing code translations that generate spurious fault events includes detecting, while executing a first native translation of target instruction set architecture (ISA) instructions, occurrence of a fault event, executing the target ISA instructions or a functionally equivalent version thereof, determining whether occurrence of the fault event is replicated while executing the target ISA instructions or the functionally equivalent version thereof, and in response to determining that the fault event is not replicated, determining whether to allow future execution of the first native translation or to prevent such future execution in favor of forming and executing one or more alternate native translations.

BACKGROUND

Some computing systems implement translation software to translate portions of target instruction set architecture (ISA) instructions into native instructions that may be executed more quickly and efficiently through various optimization techniques such as combining, reorganizing, and eliminating instructions. More particularly, in transactional computing systems that have the capability to speculate and roll back operations, translations may be optimized in ways that potentially violate the semantics of the target ISA. Due to such optimizations, once a translation has been generated, it can be difficult to distinguish whether events (e.g., an architectural fault such as a page violation) encountered while executing a translation are architecturally valid or are spuriously created by over-optimization of the translation.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically shows an example computing system in accordance with an embodiment of the present disclosure.

FIG. 2 shows an example of a trap mechanism for pausing execution in order to determine whether a fault event is spuriously created by a translation.

FIG. 3 shows an example of a counter mechanism for pausing execution in order to determine whether a fault event is spuriously created by a translation.

FIG. 4 shows an example of a method for identifying and replacing code translations that generate spurious fault events in accordance with an embodiment of the present disclosure.

DETAILED DESCRIPTION

The present disclosure provides a mechanism for optimizing native translations of corresponding non-native code portions, such as target instruction set architecture (ISA) code portions. The intelligent generation of translations, and the optimization thereof, may be handled by translation software, which may be included as part of a software layer that provides an interface between an ISA and a processor core. More particularly, the present disclosure provides a fault narrowing mechanism that identifies and replaces code translations that generate spurious fault events (e.g., architectural faults). As discussed above, in some cases, a translation may be aggressively or overly optimized such that the translation generates spurious fault events. Note that “spurious” means that if the corresponding target ISA code or a functional equivalent thereof were executed, then the fault event would not occur. In other cases, a fault event may be generated by the target ISA code. The mechanism determines whether a fault event encountered in a translation is generated spuriously by the translation, for example due to over-optimization of the translation, and if it is determined that the fault event was spuriously caused by the translation, it generates a different translation.

In one example, the translation software redirects execution to an instruction pointer (IP) of a native translation in lieu of corresponding target ISA code by the processor core. The native translation may be executed without using a hardware decoder located on the processor core. Note that when this disclosure refers to execution “without using the hardware decoder,” that language may still encompass minor or trivial uses of decode logic in hardware while a translation is being executed. Circumventing the hardware decoder (i.e., by executing a translation) in many cases will improve speed of execution, reduce power consumption, and provide various other benefits. During execution of the native translation, a fault may be encountered. At this point, it is unknown whether the fault is an actual architectural event or if it is an artifact of the way that the code has been optimized in the translation. As such, execution is rolled back to a committed state (e.g., through a checkpoint mechanism), and a different version of code corresponding to the translation that does not produce the artifact event is executed. In one example, the alternate version of the code corresponding to the translation is target ISA code that is decoded by a hardware decoder into native instructions. If the fault is encountered during execution of the alternate code, then it is concluded that the translation itself was not the cause of the fault. If the fault is not encountered during execution of the alternate code, then it is concluded that the translation generated the artifact, and it is determined whether to allow future execution of the native translation or to prevent such future execution in favor of forming and executing one or more alternate native translations. In some embodiments, the translation is reformed in a different way, and the reformed translation is executed subsequently. In one example, the translation is reformed with fewer optimizations so as not to cause the fault during execution.
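By way of non-limiting illustration, the roll-back-and-re-execute flow described above may be sketched in software as follows. This is a simplified model under stated assumptions, not the disclosed hardware: the `Fault` exception, the `execute_with_narrowing` function, the dictionary used as machine state, and the verdict strings are all hypothetical names introduced only for this sketch, and the "reference" callable stands in for target ISA code executed through the hardware decoder.

```python
import copy


class Fault(Exception):
    """Hypothetical stand-in for a fault event such as a page violation."""


def execute_with_narrowing(state, translation, reference):
    """Run a translation; on a fault, roll back and re-run reference code.

    Returns (resulting_state, verdict), where verdict is one of:
      "committed"     - the translation ran without a fault
      "architectural" - the fault was replicated by the reference code
      "spurious"      - the fault was not replicated (translation artifact)
    """
    checkpoint = copy.deepcopy(state)   # committed state
    working = copy.deepcopy(state)      # speculative working state
    try:
        translation(working)
        return working, "committed"
    except Fault:
        working = copy.deepcopy(checkpoint)  # roll back to the checkpoint
    try:
        reference(working)              # alternate version of the code
    except Fault:
        return checkpoint, "architectural"
    return working, "spurious"


def reference_code(state):
    # Assumed reference behavior: a single increment that never faults.
    state["x"] += 1


def over_optimized(state):
    # Assumed over-optimized translation: same effect, then a fault.
    state["x"] += 1
    raise Fault("page violation")


result_state, verdict = execute_with_narrowing(
    {"x": 0}, over_optimized, reference_code
)
```

In this sketch, the speculative update made by the faulting translation is discarded by the rollback, so the final state reflects only the re-execution of the reference code.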

By using this mechanism, a translation can be aggressively over-optimized, then quickly narrowed if necessary using the hardware decoder to get to a translation that is suitably optimized to be executed without generating fault events. Implementations without this mechanism would find the overhead of narrowing or re-optimizing to be high enough that translations would tend to be overly conservative or under-optimized to avoid the narrowing process. For example, a software interpreter may be adequate to isolate an architectural event or lack thereof, but would be obtrusively slow for narrowing and re-optimizing as the software interpreter can require hundreds of native instructions to emulate a single target ISA instruction.

FIG. 1 shows aspects of an example micro-processing and memory system 100 (e.g., a central processing unit or graphics processing unit of a personal computer, game system, smartphone, etc.) including processor core 102. Although the illustrated embodiment includes only one processor core, it will be appreciated that the micro-processing system may include additional processor cores in what may be referred to as a multi-core processing system. Microprocessor core/die 102 variously includes and/or may communicate with various memory and storage locations 104. In some cases, it will be desirable to allocate a portion (sometimes referred to as a “carveout”) of memory as secure and private such that it is invisible to the user and/or instruction set architecture (ISA) code 106. Various data and software may run from, and/or be stored in said allocation, such as software layer 108 and related software structures. As will be discussed in greater detail below, the software layer may be configured to generate, optimize, and store translations of ISA code 106, and further to manage and interact with related hardware on core 102 to determine whether translations are suitably optimized (e.g., the translations do not generate faults or other artifacts).

Memory and storage locations 104 may include L1 processor cache 110, L2 processor cache 112, L3 processor cache 114, main memory 116 (e.g., one or more DRAM chips), secondary storage 118 (e.g., magnetic and/or optical storage units) and/or tertiary storage 120 (e.g., a tape farm). Processor core 102 may further include processor registers 121. Some or all of these locations may be memory-mapped, though in some implementations the processor registers may be mapped differently than the other locations, or may be implemented such that they are not memory-mapped. It will be understood that the memory/storage components are listed above in increasing order of access time and capacity, though there are possible exceptions. In some embodiments, a memory controller may be used to handle the protocol and provide the signal interface required of main memory 116, and, typically, to schedule memory accesses. The memory controller may be implemented on the processor die or on a separate die. It is to be understood that the locations set forth above are non-limiting and that other memory/storage locations may be used instead of, or in addition to, those described above without departing from the scope of this disclosure.

Microprocessor 102 includes a processing pipeline which typically includes one or more of fetch logic 122, decode logic 124 (referred to herein as a hardware decoder or hardware decode logic (HWD)), execution logic 126, mem logic 128, and writeback logic 130. Note that one or more of the stages in the processing pipeline may be individually pipelined to include a plurality of stages to perform various associated operations. It should be understood that these five stages are somewhat specific to, and included in, a typical RISC implementation. More generally, a microprocessor may include fetch, decode, and execution logic, with mem and writeback functionality being carried out by the execution logic. The present disclosure is equally applicable to these and other microprocessor implementations, including hybrid implementations that may use VLIW instructions and/or other logic instructions.

Fetch logic 122 retrieves instructions from one or more of memory locations 104 (e.g., unified or dedicated L1 caches backed by L2-L3 caches and main memory). In some examples, instructions may be fetched and executed one at a time, possibly requiring multiple clock cycles.

Microprocessor 102 is configured to execute instructions, via execution logic 126. Such instructions are generally described and defined by an ISA that is native to the processor, which may be generated and/or executed in different modes of operation of the microprocessor. A first mode (referred to herein as the “hardware decoder mode”) of execution includes utilizing the HWD 124 to receive and decode (e.g., by parsing opcodes, operands, and addressing modes, etc.) target ISA or non-native instructions of ISA code 106 into native instructions for execution via the execution logic. It will be appreciated that the native instructions dispatched by the HWD may be functionally equivalent to the non-native instructions, in that execution of either type of instructions achieves the same final result or outcome.

A second mode (referred to herein as the “translation mode”) of execution includes retrieving and executing native instructions without use of the HWD. A native translation may cover and provide substantially equivalent functionality for any number of portions of corresponding target ISA or non-native ISA code 106. The corresponding native translation is typically optimized to some extent by the translation software relative to the corresponding non-native code that would be dispatched by the HWD. However, it will be understood that a variety of optimizations and levels of optimization may be employed.

A third mode (referred to herein as “software interpretation mode”) of execution includes utilizing a software interpreter 134 located in the software layer 108 to execute target ISA code one instruction at a time by translating the target ISA instruction into corresponding native instructions.

Typically, translation mode provides the fastest and most efficient operation out of the above described execution modes. However, there may be substantial overhead costs associated with generating an optimized translation of target ISA instructions. Accordingly, a translation may be generated for portions of target ISA code that are executed frequently or consume substantial processing time, such as frequently used or “hot” loops or functions in order to control such translation overhead. In one example, a translation may be generated for a portion of target ISA code in response to the portion of code being executed a number of times that is greater than a threshold value.

Hardware decoder mode may be slower or less efficient than translation mode and faster or more efficient than software interpretation mode. For example, hardware decoder mode may be used to execute portions of target ISA code that do not have corresponding translations. As another example, hardware decoder mode may be used to determine whether or not a translation is over-optimized based on encountering a fault during execution of a translation as will be discussed in further detail below.

Software interpretation mode may be used in corner cases or other unusual/rare circumstances, such as to isolate a fault or lack of a fault. The software interpretation mode may be used least frequently of the above described modes of operation, because the software interpretation mode may be substantially slower than the other modes of operation. For example, software interpretation mode may require hundreds of native instructions to emulate a single target ISA instruction.

For the sake of clarity, the native instructions output by the HWD in hardware decoder mode will in some cases be referred to as non-translated instructions, to distinguish them from the native translations that are executed in the translation mode without use of the HWD.

Native translations may be generated in a variety of ways. As discussed above, due to the high overhead of generating translations, in some embodiments, code portions of non-native ISA code may be profiled in order to identify whether and how those code portions should be included in new or reformed translations. When operating in hardware decoder mode, the system may dynamically change and update a code portion profile in response to the use of the HWD to execute a portion of non-native ISA code. For example, profiled code portions may be identified and defined by taken branches. This is but one example, however, and any suitable type of code portion associated definition may be used.

In certain embodiments, the code portion profile is stored in an on-core micro-architectural hardware structure (e.g., on core 102), to enable rapid and lightweight profiling of code being processed with the HWD. For example, the system may include a branch count table (BCT) 136 and a branch history table (BHT) 138 each including a plurality of records containing information about code portions of non-native ISA code 106 encountered by the HWD as branch instructions are processed. In general, the BCT tracks the number of times a branch target address is encountered, while the BHT records information about the taken branch when a branch target address is encountered. Furthermore, the BCT is used to trigger profiling for translation upon saturation of the counter for a particular code portion. For example, the BCT may be used to determine whether a code portion has been executed a number of times that exceeds a threshold value, which triggers reforming of a corresponding translation.

As the code portions of non-native ISA code are processed by the HWD, records may be dynamically added to the BCT and the BHT. For example, as the HWD processes taken branches leading to a branch target address, a record for that branch target address is added to the BCT and an initial value is inserted into a counter associated with the record. Alternatively, if a record already exists for the target address, the counter is incremented or decremented, as appropriate to the implementation. As such, the system may include micro-architectural logic for adding and updating records in the BCT and the BHT. This logic may be a distinct component or distributed within various components of the processing pipeline, though typically this logic will be operatively coupled closely with the HWD since it is the use of the HWD that results in changes to the BCT and the BHT.

From time to time, the records of the BCT and/or the BHT may be sampled and processed, for example by a summarizer 140 of software layer 108. As described above, the software layer may reside in a secure/private memory allocation of storage locations 104 that is accessible by microprocessor 102 during execution of native ISA instructions. In other words, such an allocation may be inaccessible by ISA code.

The summarizer may be implemented as a lightweight event handler that is triggered when a record in the BCT produces an event (e.g., the counter for the record saturates). In other words, the BCT produces an event, and the summarizer handles the event (e.g., by sampling and processing records in the BHT). Each counter maintained in the BCT for a target address is used to control how many times the associated code portion will be encountered before an event is taken for that code portion.
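For purposes of illustration only, the counter behavior of the BCT and the event delivered to the summarizer may be modeled as shown below. The sketch assumes a down-counting implementation in which the event is produced when the counter reaches zero; the class name `BranchCountTable`, the `initial_value` parameter, and the `on_event` callback are hypothetical and are not taken from the disclosure, which permits either incrementing or decrementing counters.

```python
class BranchCountTable:
    """Toy model of a BCT with one down-counter per branch target address."""

    def __init__(self, initial_value, on_event):
        self.initial_value = initial_value  # assumed initial counter value
        self.on_event = on_event            # stands in for the summarizer
        self.counters = {}

    def record_taken_branch(self, target_ip):
        if target_ip not in self.counters:
            # First encounter: add a record and insert the initial value.
            self.counters[target_ip] = self.initial_value
            return
        # Record already exists: decrement (one possible implementation).
        self.counters[target_ip] -= 1
        if self.counters[target_ip] == 0:
            # Counter saturated: drop the record and raise the event.
            del self.counters[target_ip]
            self.on_event(target_ip)


events = []
bct = BranchCountTable(initial_value=3, on_event=events.append)
for _ in range(4):
    bct.record_taken_branch(0x400)
```

With an initial value of 3, the first encounter creates the record and the next three decrement it, so the event is taken on the fourth encounter of the target address.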

The summarizer identifies flow into, out of, and/or between code portions when using the hardware decoder. Furthermore, the summarizer identifies one or more non-translated code portions to be included in a new native translation by producing a summarized representation (e.g., a control flow graph) of code portion control flow involving the HWD. For example, the sampling and processing by the summarizer may be used to generate and update a meta branch history table (MBHT) 142 in and between non-native code portions processed by the HWD. It will be appreciated that information about code portions and control flow may be represented in any suitable manner, data structure, etc. The information in the MBHT is subsequently consumed by a region former 144, which is responsible for forming new translations of non-native ISA code. Once formed, translations may be stored in one or more locations (e.g., a trace cache 146 of software layer 108). The region former may employ various optimization techniques in creating translations, including, but not limited to, reordering instructions, renaming registers, consolidating instructions, removing dead code, unrolling loops, etc. It will be understood that these translations may vary in length and the extent to which they have been optimized. For example, the region former may vary the aggressiveness at which a translation is optimized in order to strike a balance between increasing performance and generating spurious architectural events or artifacts during execution. It will be appreciated that the structures stored in the software layer may be included in or collectively referred to herein as a translation manager or as translation management software.

During operation, the existence of a translation may be determined using an on-core hardware redirector 132 (a.k.a., a THASH). The hardware redirector is a micro-architectural structure that includes address information or mappings sufficient to allow the processing pipeline to retrieve and execute a translation or a portion thereof associated with a non-native portion of ISA code via address mapping. Specifically, when the processing pipe branches to a target address of a non-native portion of ISA code, the target address is looked up in the THASH. Over time, translations that are frequently and/or recently requested are indexed by, and incorporated into, the hardware redirector. Each entry in the hardware redirector is associated with a translation, and provides redirection information that enables the microprocessor, during a fetch operation for a selected code portion, to cause execution to be redirected away from that code portion and to its associated translation. In order to save on processor die area and to provide rapid lookups, the hardware redirector may be of limited size, and it is therefore desirable that it be populated with entries providing redirection for the most “valuable” translations (e.g., translations that are more frequently and/or recently used). Accordingly, the hardware redirector may include usage information associated with the entries. This usage information varies in response to the hardware structure being used to redirect execution, and thus the entries are maintained in, or evicted from, the hardware redirector based on this usage information.

In the event of a hit in the THASH, the lookup returns the address of an associated translation (e.g., translation stored in trace cache 146), which is then executed in translation mode (i.e., without use of HWD 124). Alternatively, in the event of a miss in the THASH, the portion of code may be executed through a different mode of operation and one or more of the mechanisms described above may be usable to generate a translation. The THASH lookup may therefore be usable to determine whether to add/update records in the BCT and the BHT. In particular, a THASH hit means that there is already a translation for the non-native target code portion, and there is thus no need to profile execution of that portion of target code in hardware decoder mode. Note that the THASH is merely one example of a mechanism for locating and executing translations, and it will be appreciated that the processor hardware and/or software may include other mechanisms for locating and executing translations without departing from the scope of the present description.
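As a non-limiting software analogy, the hit/miss behavior of the hardware redirector and its usage-based eviction may be sketched as follows. The sketch assumes a least-recently-used replacement policy, which is only one possible way of using "usage information"; the names `HardwareRedirector`, `install`, `lookup`, and `dispatch`, as well as the example addresses, are hypothetical.

```python
from collections import OrderedDict


class HardwareRedirector:
    """Toy model of a size-limited THASH with assumed LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # target IP -> translation address

    def install(self, target_ip, translation_addr):
        self.entries[target_ip] = translation_addr
        self.entries.move_to_end(target_ip)      # mark as most recently used
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)     # evict least recently used

    def lookup(self, target_ip):
        addr = self.entries.get(target_ip)
        if addr is not None:
            self.entries.move_to_end(target_ip)  # a hit updates usage info
        return addr


def dispatch(thash, target_ip):
    """Choose an execution mode for a branch to target_ip."""
    addr = thash.lookup(target_ip)
    if addr is not None:
        return ("translation", addr)        # hit: run the native translation
    return ("hardware_decoder", target_ip)  # miss: decode target ISA code


thash = HardwareRedirector(capacity=2)
thash.install(0x100, 0x9000)
thash.install(0x200, 0x9100)
thash.lookup(0x100)            # 0x100 becomes the most recently used entry
thash.install(0x300, 0x9200)   # capacity exceeded: 0x200 is evicted
```

In this model, a miss would also be the point at which profiling records for the target address are added or updated, since no translation yet exists for that code portion.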

Throughout operation, a state of the microprocessor (e.g., registers 121 and/or other suitable states) may be checkpointed or stored to preserve the state of the microprocessor while a non-checkpointed working state version of the microprocessor speculatively executes instructions. For example, the state of the microprocessor may be checkpointed when execution of an instruction (or bundle, code portion of a translation, etc.) is completed without encountering an architectural event, artifact, exception, fault, etc. For example, an architectural event (e.g., a fault event) may include a page violation, a memory alignment violation, a memory ordering violation, a break point, execution of an illegal instruction, etc. If a fault event is encountered during execution, then the instruction may be rolled back, and state of the microprocessor may be restored to the checkpointed state. Then operation may be adjusted to handle the fault event. For example, the microprocessor may operate in hardware decoder mode and mechanisms for determining whether the encountered event is an artifact of the translation may be employed. In one example, the decode logic is configured to manage checkpointing/rollback/restore operations, although it will be appreciated that in some embodiments a different logical unit may control such operations. In some embodiments, checkpointing/rollback/restore schemes may be employed in connection with the memory and storage locations 104 in what may be generally referred to as transactional memory. In other words, microprocessor 102 may be a transaction-based system.

Furthermore, during execution of a native translation, the execution logic may be configured to detect occurrence of a fault event in the native translation. Since at the time of encountering the fault event, it may not be known whether or not the fault event is an artifact generated due to a particular way in which the native translation was formed, the translation manager causes the code portion to be executed differently. For example, the target ISA instructions or a functionally equivalent version thereof may be executed without executing the native translation to determine whether the fault event was a product of the native translation.

If a fault event is encountered in the translation, then the translation manager may note the IP boundaries of the translation before execution of the translation is rolled back. In some cases, the IP boundaries may include one contiguous portion of target ISA code. In other cases, the IP boundaries may include multiple non-contiguous portions of target ISA code (e.g., if the translation was formed including a target ISA branch that was assumed to be taken when the translation was generated). The IP boundaries may be used during execution of the target ISA instructions or a functionally equivalent version thereof to determine whether a fault event occurs in the code portion corresponding to the native translation.

In one example, the system may operate in hardware decoder mode to produce a functional equivalent of the target ISA instructions. In particular, the HWD receives target ISA instructions starting at the IP boundary corresponding to the beginning of the native translation, decodes them into native instructions, and dispatches the native instructions to the execution logic for execution. The native instructions may be executed by the execution logic until the fault event is encountered again, or execution leaves the code portion corresponding to the native translation (e.g., the IP is beyond the IP boundary corresponding to the end of the translation). Various mechanisms for determining whether execution has left the code portion corresponding to the native translation may be employed during operation in hardware decoder mode. Several non-limiting examples of such mechanisms are discussed in further detail below with reference to FIGS. 2 and 3.
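The re-execution loop described above, together with IP boundaries that may span non-contiguous portions of target ISA code, may be illustrated by the following minimal sketch. All names are hypothetical (`Fault`, `in_boundaries`, `replay_in_hardware_decoder_mode`, `make_step`), boundaries are assumed to be half-open `(start, end)` IP ranges, and the `max_steps` guard exists only to keep the toy model from looping indefinitely.

```python
class Fault(Exception):
    """Hypothetical stand-in for a fault event."""


def in_boundaries(ip, boundaries):
    """True if ip lies in any half-open (start, end) range of the translation."""
    return any(start <= ip < end for start, end in boundaries)


def replay_in_hardware_decoder_mode(start_ip, boundaries, step, max_steps=10_000):
    """Execute from start_ip until a fault occurs or execution leaves the region.

    step(ip) models decoding and executing one target ISA instruction; it
    returns the next IP or raises Fault.
    """
    ip = start_ip
    for _ in range(max_steps):
        if not in_boundaries(ip, boundaries):
            return "not_replicated", ip   # left the code portion without a fault
        try:
            ip = step(ip)
        except Fault:
            return "replicated", ip       # fault encountered again
    return "undetermined", ip             # guard for the toy model only


def make_step(successors, faulting_ips=()):
    def step(ip):
        if ip in faulting_ips:
            raise Fault(ip)
        return successors[ip]
    return step


# Two non-contiguous portions, e.g. joined by a branch assumed taken.
successors = {0: 1, 1: 2, 2: 10, 10: 11, 11: 12}
boundaries = [(0, 3), (10, 12)]

clean = replay_in_hardware_decoder_mode(0, boundaries, make_step(successors))
faulty = replay_in_hardware_decoder_mode(
    0, boundaries, make_step(successors, faulting_ips={11})
)
```

Here `clean` models a run that exits the region without replicating the fault, and `faulty` models a run in which the instruction at IP 11 faults again under the reference execution.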

If the event is encountered during execution of the target ISA instructions or their functional equivalent (e.g., in the hardware decoder mode), it can be assumed that the event is an architectural fault that was not created by the translation, and redirection of control flow to the architectural exception vector is performed where control is passed to the translation manager or other architectural event handling logic to correct the architectural event or provide other event handling operations.

If execution leaves the translation without encountering the fault event, then it can be assumed that the native translation spuriously caused the fault event. In other words, the translation manager determines that the fault event is not replicated during execution of the target ISA instructions or the functionally equivalent version thereof. In response to determining that the fault event is not replicated, the translation manager is configured to determine whether to allow future execution of the native translation or to prevent such future execution in favor of forming and executing one or more alternate native translations. Note that a future execution of the native translation may include any execution subsequent to determining that the native translation spuriously caused the fault event. The native translation may be prevented from being executed in order to reduce the likelihood of the fault event occurring during subsequent executions of the target ISA instructions or the functionally equivalent version thereof. In some embodiments, the determination whether to allow future execution of the native translation or to prevent such future execution may include forming and executing the one or more alternate translations upon determining that a performance cost associated with forming the one or more alternate translations is less than a performance cost associated with continuing to execute the first native translation, executing the target ISA instructions or a functionally equivalent version thereof, without executing the first native translation, or a combination thereof. It will be appreciated that the performance costs may be calculated in any suitable manner without departing from the scope of the present disclosure.

In some embodiments, the native translation may be prevented from being executed immediately after determining that the native translation spuriously caused the fault event such that the translation is not executed again. For example, when the code portion corresponding to the native translation is encountered subsequent to the determination, the system may operate in hardware decoder mode to execute the code portion instead of executing the native translation. As another example, a different translation may be executed instead of the native translation.

In some embodiments, the native translation may be executed one or more times subsequent to the determination before the native translation is prevented from being executed. For example, the native translation may be executed subsequently in order to determine if the native translation spuriously causes any different faults. In one particular example, the native translation is not prevented from being executed until a first fault and a second fault are encountered a designated number of times as a result of executing the native translation. In other words, the native translation may be repeatedly executed until it can be assumed with a level of confidence that the native translation is the cause of a number of different faults before execution of the native translation is prevented.
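One possible way of expressing the "first fault and second fault encountered a designated number of times" criterion is sketched below for illustration. The class `SpuriousFaultLog`, its method names, and the fault labels are hypothetical; the sketch further assumes that each watched fault must individually reach the designated count, which is only one reading of the example above.

```python
from collections import Counter


class SpuriousFaultLog:
    """Counts spurious faults per kind for a single native translation."""

    def __init__(self, watched_faults, designated_count):
        self.watched_faults = tuple(watched_faults)
        self.designated_count = designated_count
        self.counts = Counter()

    def record(self, fault_kind):
        """Note one spurious occurrence of fault_kind."""
        self.counts[fault_kind] += 1

    def should_prevent_execution(self):
        """True once every watched fault has reached the designated count."""
        return all(
            self.counts[kind] >= self.designated_count
            for kind in self.watched_faults
        )


log = SpuriousFaultLog(
    watched_faults=("page_violation", "alignment_violation"),
    designated_count=2,
)
log.record("page_violation")
log.record("page_violation")
after_first_fault_only = log.should_prevent_execution()
log.record("alignment_violation")
log.record("alignment_violation")
after_both_faults = log.should_prevent_execution()
```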

In some cases, the system may operate in software interpretation mode instead of hardware decoder mode in response to encountering a fault event during execution of the native translation (e.g., to handle corner cases). As discussed above, hardware decoder mode may be preferred over software interpretation mode for fault narrowing operations because the software interpretation mode may be significantly slower to execute the target ISA instructions. For example, the software interpreter may take over one hundred times longer to execute an instruction than the HWD may take to execute the same instruction.

In some embodiments, the translation manager may be configured to generate an updated or reformed translation of the target ISA instructions that is optimized differently based on encountering an artifact or fault event in the first translation. In one example, the reformed translation is optimized differently so as not to generate the architectural event. For example, the updated translation may include fewer optimizations than the previous translation, such as fewer combinations, reorganizations, and/or eliminations of target ISA instructions. Further, the execution logic may be configured to, upon subsequently encountering the code portion of the target ISA instructions, execute the updated or reformed translation instead of the previous translation that spuriously caused the fault event.

Since there may be substantial overhead costs associated with generating an optimized translation of target ISA instructions, in some embodiments, the translation manager may be configured to track activity related to the translation subsequent to determining that the fault was an artifact of the translation, and determine if or when it would be suitable to update the translation. In one example, the translation manager is configured to increment a counter associated with the native translation subsequent to determining that the fault event is an artifact of the translation. Further, the translation manager may generate the updated translation of the target ISA instructions responsive to the counter saturating or becoming greater than a threshold value. The counter may be employed to track or count a variety of different factors, events, parameters, or execution properties associated with the translation that spuriously caused the fault event. Non-limiting examples of these factors that the counter may track include time, a number of translation executions, a number of translation executions that spuriously cause a fault event, and a number of translation executions that spuriously cause a number of different fault events. In some embodiments the counter may include a decision function that includes a combination of these factors.
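The counter-driven decision to reform a translation may be illustrated with the following sketch. It assumes, purely for illustration, that the counter tracks the number of executions that spuriously cause a fault event and that "fewer optimizations" can be modeled as a lower integer optimization level; `TranslationRecord`, `opt_level`, `threshold`, and `note_spurious_fault` are hypothetical names.

```python
class TranslationRecord:
    """Tracks spurious faults for one translation and decides when to reform."""

    def __init__(self, opt_level, threshold):
        self.opt_level = opt_level  # assumed integer measure of aggressiveness
        self.threshold = threshold
        self.counter = 0

    def note_spurious_fault(self):
        """Increment the counter; reform once it exceeds the threshold.

        Returns True if the translation was reformed on this call.
        """
        self.counter += 1
        if self.counter > self.threshold and self.opt_level > 0:
            self.opt_level -= 1     # reform with fewer optimizations
            self.counter = 0        # restart tracking for the new translation
            return True
        return False


record = TranslationRecord(opt_level=3, threshold=2)
outcomes = [record.note_spurious_fault() for _ in range(3)]
```

Deferring the reform until the counter exceeds the threshold reflects the trade-off noted above: the overhead of generating a new translation is incurred only after the existing translation has shown itself to be repeatedly problematic.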

It will be appreciated that the counter may be used to track any suitable parameter or event associated with the translation in order to determine if or when to reform the translation. Moreover, it will be appreciated that the counter is merely one example of a tracking mechanism, and any suitable mechanism may be employed to decide when to reform the translation.

In some embodiments, the translation manager may be configured to reform the translation (or generate a new translation) of only a subset of the target ISA instructions that were represented by the translation that spuriously created the fault. In some embodiments, the translation manager may be configured to generate a plurality of translations that span the target ISA instructions that were represented by the translation that spuriously created the fault.

FIGS. 2 and 3 show examples of various mechanisms that may be employed to pause execution during operation in hardware decoder mode in order to determine whether the code portion corresponding to the translation is executed without encountering the event, which may be used to determine whether an event is spuriously created by the translation. FIG. 2 shows an example of a mechanism 200 that causes execution to be paused responsive to encountering a target of a branch instruction that is dispatched by the HWD. In particular, when an event is encountered in translation mode, execution is rolled back to the beginning of the IP boundary 210 of the translation. The IP boundary defines the code region of the translation by denoting the IP at the beginning of the translation and the IP at the end of the translation. The translation manager calls the HWD to operate in hardware decoder mode with a particular jump instruction that includes a “sticky bit” 202 that is set based on encountering the event. By setting the sticky bit in the jump instruction that invokes the HWD, each branch causes a field 204 to be set that is associated with the branch target. The set bit is recognized upon execution of the branch target, causing execution in hardware decoder mode to be paused. The set bit is cleared and control is passed from the HWD to the translation manager. The translation manager determines whether the IP is within the IP boundary of the code portion corresponding to the translation. If the IP is beyond the IP boundary of the translation, then the event was not encountered in the code portion at issue and it can be assumed that the event was an artifact of the translation; the translation may need to be reformed in a different manner, and the sticky bit is cleared. If the IP is within the IP boundary of the translation, then control is passed back to the HWD and execution in hardware decoder mode continues until another branch target having a set bit is encountered or the event is encountered. If the event is encountered, then it can be assumed that the event is not an artifact of the translation and the translation may not be over-optimized, and the sticky bit is cleared.
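The decision logic of the FIG. 2 mechanism can be modeled in software as a rough sketch. Here the hardware behavior is replaced by a hypothetical trace of events; the function name, the event encoding, and the example addresses are assumptions made for illustration, and the real mechanism operates on hardware fields rather than Python tuples.

```python
from typing import Iterable, Tuple

# Hypothetical event encoding: (kind, instruction pointer),
# where kind is "branch_target" or "fault".
Event = Tuple[str, int]


def classify_fault(trace: Iterable[Event], ip_start: int, ip_end: int) -> str:
    """Model the translation manager's checks during hardware decoder mode.

    Each "branch_target" event stands for a pause caused by a set bit in
    the branch target's field; each "fault" event stands for the original
    event being encountered again.
    """
    for kind, ip in trace:
        if kind == "fault":
            # Event replicated without the translation: not an artifact.
            return "genuine"
        if kind == "branch_target" and not (ip_start <= ip <= ip_end):
            # Execution left the IP boundary without the event: artifact.
            return "spurious"
        # Otherwise the IP is still inside the boundary; resume execution.
    return "undetermined"


# Assumed IP boundary of the translation: 0x1000 through 0x10FF.
left_region = [("branch_target", 0x1010), ("branch_target", 0x2000)]
refaulted = [("branch_target", 0x1010), ("fault", 0x1020)]
spurious_result = classify_fault(left_region, 0x1000, 0x10FF)
genuine_result = classify_fault(refaulted, 0x1000, 0x10FF)
```

In both terminal outcomes the sticky bit would be cleared; the sketch leaves that step out because it has no effect on the classification itself.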

The above described mechanism may be referred to as a “branch callback trap” because each time a branch target is encountered with a set bit, execution in hardware decoder mode is paused and control is passed to the translation manager. In other words, the sticky bit is the mechanism by which the translation manager gets passed control from hardware decoder mode. Note that when the HWD is called for operation other than when an event is encountered, the sticky bit in the jump instruction may be cleared to suppress the branch callback trap mechanism.

In some microprocessor implementations that include a hardware redirector or THASH that is accessed by the HWD to check for a translation, access to the THASH by the HWD is disabled or matches in the THASH are inhibited based on the event being encountered. In one example, access to the THASH is disabled when the sticky bit in the jump instruction that calls the HWD is set. By suppressing the lookup of the THASH, execution is not redirected to the translation so that execution in hardware decoder mode may be performed to determine whether the event is generated by the translation. In other words, access to the THASH is disabled when executing the target ISA instructions without executing the native translation.

FIG. 3 shows an example of a counter mechanism 300 that causes execution to be paused responsive to a counter expiring or elapsing. Similar to the above described example, when the HWD is called based on encountering an event during operation in translation mode, a counter 302 may be set, for example by setting a bit in a particular jump instruction that calls the HWD. During execution in hardware decoder mode, the counter counts down and when the counter expires execution is paused and control is passed to the translation manager. The translation manager determines whether the IP 306 is within the IP boundary 308 of the code portion corresponding to the translation. If the IP is beyond the IP boundary of the translation, then the event was not encountered in the code portion at issue and it can be assumed that the event was an artifact of the translation, and the translation may need to be reformed in a different manner. If the IP is within the IP boundary of the translation, then control is passed back to the HWD and execution in hardware decoder mode continues until the counter expires again or the event is encountered. If the event is encountered, then it can be assumed that the event is not an artifact of the translation and the translation may not be over-optimized.
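As an illustration only, the countdown variant of FIG. 3 can be sketched as follows. The function name, the choice of counting instructions, and the example addresses are assumptions; the counter could equally be set for clock cycles or branch instructions.

```python
from typing import Optional, Sequence


def classify_with_countdown(
    ips: Sequence[int],
    fault_index: Optional[int],
    ip_start: int,
    ip_end: int,
    period: int,
) -> str:
    """Check the IP boundary each time a hypothetical instruction counter expires.

    ips: instruction pointers executed in hardware decoder mode, in order.
    fault_index: position in ips where the event recurs, or None if it never does.
    period: assumed number of instructions between checks (must be >= 1).
    """
    remaining = period
    for i, ip in enumerate(ips):
        if fault_index is not None and i == fault_index:
            # Event replicated without the translation: not an artifact.
            return "genuine"
        remaining -= 1
        if remaining == 0:
            # Counter expired: control passes to the translation manager.
            if not (ip_start <= ip <= ip_end):
                return "spurious"
            remaining = period  # still inside the boundary; re-arm the counter
    return "undetermined"


# Assumed IP boundary 100..112; execution then continues at 200 and 204.
ips = [100, 104, 108, 112, 200, 204]
spurious_result = classify_with_countdown(ips, None, 100, 112, period=2)
genuine_result = classify_with_countdown(ips, 2, 100, 112, period=2)
```

A longer period means fewer pauses but a later detection that execution has left the boundary, which is the trade-off that checking only occasionally accepts.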

It will be appreciated that the counter may be set to any suitable duration or may track any suitable execution property or parameter. In one example, the counter may be set for a designated number of clock cycles. In another example, the counter may be set for a designated number of instructions. In yet another example, the counter may expire in response to encountering a branch instruction. In still yet another example, the counter may expire in response to encountering a designated number of branch instructions.

It will be appreciated that the above described mechanisms may be particularly applicable to operation in hardware decoder mode, because control is passed from hardware (e.g., execution logic) to software (e.g., translation manager) when execution is paused to determine whether execution has left the IP boundary of the code portion at issue. Moreover, such mechanisms may allow for execution to be paused occasionally in order to perform an IP boundary check that allows for faster execution relative to an approach that checks after execution of each instruction.

FIG. 4 shows an example of a method 400 for optimizing a translation of target ISA instructions in accordance with an embodiment of the present disclosure. The method 400 may be implemented with any suitable software/hardware, including configurations other than those shown in the foregoing examples. In some cases, however, the process flows may reference components and processes that have already been described. For purposes of clarity and minimizing repetition, it may be assumed that these components/processes are similar to the previously described examples.

At 402, the method 400 includes detecting, while executing a first native translation of target ISA instructions, occurrence of a fault event in the first native translation. The first native translation may be executable to achieve substantially equivalent functionality as obtainable via execution of the target ISA instructions. In other words, the first native translation is designed such that execution of the first native translation should produce the same output as the target ISA instructions. In one example, the fault event includes one of a page violation, a memory alignment violation, a memory ordering violation, a break point, and execution of an illegal instruction.

At 404, the method 400 includes decoding the target ISA instructions into functionally equivalent native instructions with a hardware decoder in response to detecting occurrence of the fault event while executing the first native translation.

At 406, the method 400 includes executing the target ISA instructions or a functionally equivalent version thereof, where such execution is performed without executing the first native translation.

At 408, the method 400 includes determining whether occurrence of the fault event is replicated while executing the target ISA instructions or the functionally equivalent version thereof.

At 410, the method 400 includes, in response to determining that the fault event is not replicated, determining whether to allow future execution of the first native translation or to prevent such future execution in favor of forming and executing one or more alternate native translations.

At 412, the method 400 may optionally include, in response to determining that the fault event is not replicated, forming one or more alternate native translations of the target ISA instructions. The one or more alternate native translations may be executable to achieve substantially equivalent functionality as obtainable via execution of the target ISA instructions. In some cases, the one or more alternate native translations are optimized differently than the first native translation so as to avoid occurrence of the fault event that was encountered during execution of the first native translation. In some cases, the one or more alternate native translations may include fewer optimizations than employed in the first native translation.

At 414, the method 400 may optionally include executing the one or more alternate native translations upon subsequently encountering the target ISA instructions.
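The control flow of steps 402 through 412 can be summarized in a short sketch. The four callables and the returned strings are hypothetical stand-ins for the execution logic, hardware decoder, and translation manager; they are assumptions for illustration and not part of the disclosure.

```python
from typing import Callable, Optional


def handle_translation_fault(
    run_translation: Callable[[], Optional[str]],
    run_reference: Callable[[], Optional[str]],
    reform_is_worthwhile: Callable[[], bool],
    form_alternate: Callable[[], None],
) -> str:
    """Sketch of method 400; each callable returns a fault name or None."""
    fault = run_translation()  # 402: execute the first native translation
    if fault is None:
        return "no_fault"
    # 404-408: execute the target ISA instructions (or an equivalent version)
    # without the translation and see whether the same fault recurs.
    replicated = run_reference() == fault
    if replicated:
        return "genuine_fault"
    # 410: the fault was an artifact; decide whether to keep the translation.
    if reform_is_worthwhile():
        form_alternate()  # 412: form one or more alternate translations
        return "reformed"
    return "kept"


formed = []
outcome = handle_translation_fault(
    run_translation=lambda: "page_violation",
    run_reference=lambda: None,  # fault not replicated in this example
    reform_is_worthwhile=lambda: True,
    form_alternate=lambda: formed.append("alternate"),
)
```

Passing the decision at 410 in as a callable keeps the sketch neutral about the policy, which could be a cost comparison or a saturating counter.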

While the depicted method may be performed in connection with any suitable hardware configuration, it will be appreciated that modifications, additions, omissions, and refinements may be made to these steps in accordance with method descriptions included above and described with references to the mechanisms, hardware, and systems shown in FIGS. 1-3.

This written description uses examples to disclose the invention, including the best mode, and also to enable a person of ordinary skill in the relevant art to practice the invention, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the invention is defined by the claims, and may include other examples as understood by those of ordinary skill in the art. Such other examples are intended to be within the scope of the claims.

1. A method for identifying and replacing code translations that generate spurious fault events, comprising: detecting, while executing a first native translation of target instruction set architecture (ISA) instructions, occurrence of a fault event, the first native translation being executable to achieve substantially equivalent functionality as obtainable via execution of the target ISA instructions; decoding the target ISA instructions into functionally equivalent native instructions with a hardware decoder in response to detecting occurrence of the fault event while executing the first native translation; executing the target ISA instructions or a functionally equivalent version thereof, where such execution is performed without executing the first native translation; determining whether occurrence of the fault event is replicated while executing the target ISA instructions or the functionally equivalent version thereof; and in response to determining that the fault event is not replicated, determining whether to allow future execution of the first native translation or to prevent such future execution in favor of forming and executing one or more alternate native translations.
2. The method of claim 1, where determining whether to allow or prevent future execution of the first native translation includes forming and executing the one or more alternate translations upon determining that a performance cost associated with forming the one or more alternate translations is less than a performance cost associated with continuing to execute the first native translation.
3. The method of claim 1, further comprising incrementing a counter associated with the first native translation, and where determining whether to allow or prevent future execution of the first native translation includes preventing such execution and forming and executing the one or more alternate translations in response to saturating the counter.
4. The method of claim 3, where the counter is incremented in response to determining that a fault event occurring during execution of the first native translation is not replicated when executing target ISA instructions or a functionally equivalent version thereof.
5. The method of claim 1, further comprising forming one or more alternate native translations to be executed instead of the first native translation, where the one or more alternative native translations are optimized differently than the first native translation so as to avoid occurrence of the fault event that was encountered during execution of the first native translation.
6. The method of claim 5, where the one or more alternative native translations include fewer optimizations than employed in the first native translation.
7. The method of claim 1, further comprising: pausing execution of the target ISA instructions or the functionally equivalent version thereof responsive to encountering a target of a branch instruction; determining whether an instruction pointer is within an instruction pointer boundary corresponding to the first native translation when execution is paused; and if the instruction pointer is beyond the instruction pointer boundary when execution is paused, determining that occurrence of the fault event is not replicated while executing the target ISA instructions or the functionally equivalent version thereof.
8. The method of claim 7, where the target of the branch instruction includes a field having a bit that is set responsive to detecting occurrence of the fault event while executing the first native translation, and execution is paused responsive to encountering the set bit.
9. The method of claim 8, where the microprocessor includes a hardware redirector that is accessed by a hardware decoder to check for a native translation corresponding to a portion of target ISA instructions, and where access to the hardware redirector by the hardware decoder is disabled when executing the target ISA instructions or the functionally equivalent version thereof without executing the first native translation.
10. The method of claim 1, further comprising: setting a counter for execution of the target ISA instructions or the functionally equivalent version thereof based on the fault event, pausing execution of the target ISA instructions or the functionally equivalent version thereof responsive to the counter expiring; determining whether an instruction pointer is within an instruction pointer boundary corresponding to the first native translation when execution is paused; and if the instruction pointer is beyond the instruction pointer boundary when execution is paused, determining that occurrence of the fault event is not replicated while executing the target ISA instructions or the functionally equivalent version thereof.
11. A micro-processing and memory system comprising: memory configured to store target ISA instructions and a first native translation executable to achieve substantially equivalent functionality as obtainable via execution of the target ISA instructions; a microprocessor including, execution logic configured to (1) detect, during execution of the first native translation, occurrence of a fault event, (2) roll back execution of the first native translation in response to detecting occurrence of the fault event while executing the first native translation; a hardware decoder configured to decode the target ISA instructions into functionally equivalent native instructions in response to detecting occurrence of the fault event while executing the first native translation, where the execution logic is configured to execute the target ISA instructions or a functionally equivalent version thereof, where such execution is performed without executing the first native translation; and a translation manager configured to (1) determine whether occurrence of the fault event is replicated while executing the target ISA instructions or the functionally equivalent version thereof, and (2) in response to determining that the fault event is not replicated, determine whether to allow future execution of the first native translation or to prevent such future execution in favor of forming and executing one or more alternate native translations.
12. The system of claim 11, where determining whether to allow or prevent future execution of the first native translation includes forming and executing the one or more alternate translations upon determining that a performance cost associated with forming the one or more alternate translations is less than a performance cost associated with continuing to execute the first native translation.
13. The system of claim 11, where the execution logic is configured to increment a counter associated with the first native translation, and where determining whether to allow or prevent future execution of the first native translation includes preventing such execution and forming and executing the one or more alternate translations in response to saturating the counter.
14. The system of claim 12, where the translation manager is configured to form one or more alternate native translations to be executed instead of the first native translation, where the one or more alternative native translations are optimized differently than the first native translation so as to avoid occurrence of the fault event that was encountered during execution of the first native translation.
15. The system of claim 11, where the execution logic is configured to pause execution of the target ISA instructions or the functionally equivalent version thereof responsive to encountering a target of a branch instruction, and where the translation manager is configured to (1) determine whether an instruction pointer is within an instruction pointer boundary corresponding to the first native translation when execution is paused, and (2) if the instruction pointer is beyond the instruction pointer boundary when execution is paused, determine that occurrence of the fault event is not replicated while executing the target ISA instructions or the functionally equivalent version thereof.
16. The system of claim 15, where the target of the branch instruction includes a field having a bit that is set responsive to detecting occurrence of the fault event while executing the first native translation, and execution is paused responsive to encountering the set bit.
17. The system of claim 11, where the translation manager is configured to set a counter for execution of the target ISA instructions or the functionally equivalent version thereof based on detection of the fault event, where the execution logic is configured to pause execution of the target ISA instructions or the functionally equivalent version thereof responsive to the counter expiring, where the translation manager is configured to determine whether an instruction pointer is within an instruction pointer boundary corresponding to the first native translation when execution is paused, and if the instruction pointer is beyond the instruction pointer boundary when execution is paused, determine that occurrence of the fault event is not replicated while executing the target ISA instructions or the functionally equivalent version thereof.
18. A method for identifying and replacing code translations that generate spurious fault events, comprising: detecting, while executing a first native translation of target instruction set architecture (ISA) instructions, occurrence of a fault event, the first native translation being executable to achieve substantially equivalent functionality as obtainable via execution of the target ISA instructions; rolling back execution of the first native translation in response to detecting the fault event; decoding the target ISA instructions into functionally equivalent native instructions with a hardware decoder in response to detecting occurrence of the fault event while executing the first native translation, where targets of branch instructions decoded by the hardware decoder include a field having a bit that is set responsive to encountering the fault event; executing the native instructions dispatched by the hardware decoder; pausing execution of the native instructions responsive to encountering a set bit in the field of a target of a branch instruction; determining whether an instruction pointer is within an instruction pointer boundary corresponding to the first native translation when execution is paused; if the instruction pointer is beyond the instruction pointer boundary when execution is paused, determining that occurrence of the fault event is not replicated while executing the target ISA instructions or the functionally equivalent version thereof; and in response to determining that the fault event is not replicated, forming and executing one or more alternate translations upon determining that a performance cost associated with forming the one or more alternate translations is less than a performance cost associated with continuing to execute the first native translation.
19. The method of claim 18, where the one or more alternative native translations are optimized differently than the first native translation so as to avoid occurrence of the fault event that was encountered during execution of the first native translation.
20. The method of claim 19, where the one or more alternative native translations include fewer optimizations than employed in the first native translation.